Probability theory

part I

Eva Freyhult

NBIS, SciLifeLab

April 7, 2025

Probability

Probability describes how likely an event, \(E\), is to happen.

Example events:

Get an even number when rolling a die.

A mutation in a gene.

Catch a cold.

The sun rises in the morning.

Probability

Probability describes how likely an event, \(E\), is to happen.

\[0 \leq P(E) \leq 1\]

A probability is always between 0 and 1, where 1 means that the event always happens, and 0 that it never happens.

Probability

Probability describes how likely an event, \(E\), is to happen.

  1. \(0 \leq P(E) \leq 1\)
  2. \(P(S) = 1\)

The total probability over the whole sample space is always 1.

The sample space, \(S\), is the set of all possible outcomes.

Probability

Probability describes how likely an event, \(E\), is to happen.

  1. \(0 \leq P(E) \leq 1\)
  2. \(P(S) = 1\)
  3. If \(E\), \(F\) are disjoint events, then \(P(E \cup F) = P(E) + P(F)\)

The probability that either of two disjoint (non-overlapping) events occurs is the sum of the probabilities of the events.

Probability

Probability describes how likely an event, \(E\), is to happen.

Axioms of probability

  1. \(0 \leq P(E) \leq 1\)
  2. \(P(S) = 1\)
  3. If \(E\), \(F\) are disjoint events, then \(P(E \cup F) = P(E) + P(F)\)

Common rules of probability

Based on the axioms, the following rules of probability can be proved.

  • Complement rule: let \(E'\) be the complement of \(E\), then \(P(E') = 1 - P(E)\)
  • Impossible event: \(P(\emptyset)=0\)
  • Probability of a subset: If \(E \subseteq F\) then \(P(F) \geq P(E)\)
  • Addition rule: \(P(E \cup F) = P(E) + P(F) - P(E \cap F)\)
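
As a small worked example (not in the original slides, using a fair six-sided die): let \(E\) = {even number} = {2, 4, 6} and \(F\) = {at least 4} = {4, 5, 6}. Then \(E \cap F = \{4, 6\}\) and, by the addition rule,

\[P(E \cup F) = \frac{3}{6} + \frac{3}{6} - \frac{2}{6} = \frac{4}{6} \approx 0.67.\]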

The urn model

[Urn model illustrations: a fair coin, age, pollen allergy.]
By drawing balls from the urn, with or without replacement, probabilities and other properties of the model can be inferred.

Random variables

A random variable describes the outcome of a random experiment.

  • The weight of a random newborn baby, \(W\), \(P(W>4.0kg)\)
  • The smoking status of a random mother, \(S\), \(P(S=1)\)
  • The hemoglobin concentration in blood, \(Hb\), \(P(Hb<125 g/L)\)
  • The number of mutations in a gene, \(M\)
  • BMI of a random man, \(B\)
  • Weight status of a random man (underweight, normal weight, overweight, obese), \(W\)
  • The result of throwing a die, \(X\)

Random variables

A random variable describes the outcome of a random experiment.

  • Random variables: \(X, Y, Z, \dots\), in general denoted by a capital letter.

  • Probability: \(P(X=5)\), \(P(Z>0.34)\), \(P(W \geq 3.5 | S = 1)\)

  • Observations of the random variable, \(x, y, z, \dots\)

  • The sample space is the collection of all possible observation values.

  • The population is the collection of all possible observations.

  • A sample is a subset of the population.

Discrete random variables

A categorical random variable has nominal or ordinal outcomes such as {red, blue, green} or {tiny, small, average, large, huge}.

A discrete random variable has a countable number of outcome values, such as {1, 2, 3, 4, 5, 6}, {0, 2, 4, 6, 8}, or all integers.

A discrete or categorical random variable can be described by its probability mass function (PMF).

The probability that the random variable, \(X\), takes the value \(x\) is denoted \(P(X=x) = p(x)\).

Example: a fair six-sided die

Possible outcomes: \(\{1, 2, 3, 4, 5, 6\}\)

Example: a fair six-sided dice

The probability mass function:

x      1      2      3      4      5      6
p(x)   0.167  0.167  0.167  0.167  0.167  0.167

Why a fair die?

  • Can be used to simulate a random outcome.
  • One or several sides can be chosen to represent a successful outcome.
  • By altering the number of sides, the probability of success can be set.

  • In life science/your work, what could a dice model represent?
  • Group discussion 2 min!

Example: Nucleotide at a given site

Table 1: Probability mass function of a nucleotide site.
x      A     C     T     G
p(x)   0.4   0.2   0.1   0.3
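
A minimal R sketch (not part of the original slide) of how this PMF could be used to simulate nucleotides with sample():

    # Simulate 10000 nucleotides at the site using the PMF in Table 1
    set.seed(1)                                   # for reproducibility
    nt <- sample(c("A", "C", "T", "G"), size = 10000,
                 replace = TRUE, prob = c(0.4, 0.2, 0.1, 0.3))
    table(nt) / length(nt)                        # relative frequencies approximate p(x)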

Example: Number of bacterial colonies

Expected value

The expected value is the average outcome of a random variable over many trials and is denoted \(E[X]\) or \(\mu\).

When the probability mass function is known, \(E[X]\) can be computed as follows:

\[E[X] = \mu = \sum_{i=1}^n x_i p(x_i),\] where \(n\) is the number of outcomes.

Alternatively, \(E[X]\) can be computed as the population mean by summing over all \(N\) objects in the population:

\[E[X] = \mu = \frac{1}{N}\sum_{i=1}^N x_i\]
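
A minimal R sketch of the first formula, applied to the fair six-sided die from the earlier slide:

    # Expected value of a fair six-sided die computed from its PMF
    x  <- 1:6
    p  <- rep(1/6, 6)
    mu <- sum(x * p)
    mu                                            # 3.5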

Variance

The variance is a measure of spread and is defined as the expected value of the squared distance from the population mean:

\[var(X) = \sigma^2 = E[(X-\mu)^2] = \sum_{i=1}^n (x_i-\mu)^2 p(x_i)\]

Standard deviation

The standard deviation is the square root of the variance and is usually denoted \(\sigma\).

\[\sigma = \sqrt{E[(X-\mu)^2]} = \sqrt{\sum_{i=1}^n (x_i-\mu)^2 p(x_i)}\] or by summing over all objects in the population:

\[\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i-\mu)^2}\]

The standard deviation is always positive and on the same scale as the outcome values.
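
A minimal R sketch (again using the fair-die PMF) of the variance and standard deviation formulas:

    # Variance and standard deviation of a fair six-sided die from its PMF
    x      <- 1:6
    p      <- rep(1/6, 6)
    mu     <- sum(x * p)
    sigma2 <- sum((x - mu)^2 * p)                 # variance, approx 2.92
    sqrt(sigma2)                                  # standard deviation, approx 1.71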

Simulate distributions

Once a random variable’s probability mass function is known, properties of interest can be computed, such as:

  • probabilities, e.g. \(P(X=a), P(X<a)\) and \(P(X \geq a)\)
  • expected value, \(E(X)\)
  • variance, \(\sigma^2\)
  • standard deviation, \(\sigma\)

If the distribution is not known, simulation might be the solution.

Simulate distributions

When rolling a single die, the probability of a six is 1/6.

The outcome of a single die roll is a random variable, \(X\), that can be described using an urn model.

Simulate distributions

When rolling 10 dice, how many sixes do you get?

  • What are the possible outcomes?
  • What is the probability of exactly 2 sixes?
  • What is the probability of at least 5 sixes?
  • What does the probability mass function look like?
  • What is the expected number of sixes?

Simulation in R!
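
A minimal simulation sketch (the variable names are illustrative, not from the slides):

    # Roll 10 dice a large number of times and count the sixes in each experiment
    set.seed(1)
    n_sixes <- replicate(100000, sum(sample(1:6, size = 10, replace = TRUE) == 6))
    mean(n_sixes == 2)                 # estimated P(exactly 2 sixes)
    mean(n_sixes >= 5)                 # estimated P(at least 5 sixes)
    table(n_sixes) / length(n_sixes)   # estimated probability mass function
    mean(n_sixes)                      # estimated expected number of sixes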

Parametric discrete distributions

  • Uniform
  • Bernoulli
  • Binomial
  • Poisson
  • Negative binomial
  • Geometric
  • Hypergeometric

Uniform

In a uniform distribution every possible outcome has the same probability.

With \(n\) different outcomes, the probability for each outcome is \(1/n\).

Examples

  • In randomized sampling, each individual/object in the population has the same probability of being selected.
  • In a randomized clinical trial, assignment to treatment groups often follows a uniform distribution.
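
A small R sketch of both examples (the group labels are hypothetical):

    # Uniform sampling: each of six outcomes has probability 1/6
    sample(1:6, size = 1)
    # Uniform random assignment of 10 hypothetical subjects to two groups
    sample(c("treatment", "control"), size = 10, replace = TRUE)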

Bernoulli

A Bernoulli trial is a random experiment with two outcomes: success (1) and failure (0). The outcome of a Bernoulli trial is a discrete random variable, \(X\).

\[P(X=x) = p(x) = \left\{ \begin{array}{ll} p & \textrm{if } x=1 \textrm{ (success)}\\ 1-p & \textrm{if } x=0 \textrm{ (failure)} \end{array} \right.\]

Using the definitions of expected value and variance it can be shown that:

\[E[X] = p\\ var(X) = p(1-p)\]

Examples

Experiments where the outcome is binary, such as; healthy/sick, dead/alive, success/failure, mutated/not mutated etc.
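
A minimal R sketch, using that a Bernoulli trial is a binomial experiment with a single trial (size = 1); p = 0.3 is an arbitrary choice:

    # Simulate 10000 Bernoulli trials with success probability 0.3
    set.seed(1)
    z <- rbinom(10000, size = 1, prob = 0.3)
    mean(z)                                       # close to E[X] = p = 0.3
    var(z)                                        # close to p(1 - p) = 0.21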

Binomial

The number of successes in a series of \(n\) independent and identical Bernoulli trials (\(Z_i\), with probability \(p\) for success) is a discrete random variable, \(X\).

\[X = \sum_{i=1}^n Z_i.\] The probability mass function of \(X\), called the binomial distribution, is

\[P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\] The expected value and variance are:

\[E[X] = np\\ var(X) = np(1-p)\]

Examples

  • The number of patients responding to a treatment in a study.
  • The number of patients experiencing a side effect in a study.
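
The 10-dice questions from the simulation slides can also be answered exactly with the binomial PMF; a sketch using dbinom:

    # Number of sixes in 10 rolls of a fair die: binomial with n = 10, p = 1/6
    dbinom(2, size = 10, prob = 1/6)              # P(exactly 2 sixes), approx 0.29
    sum(dbinom(5:10, size = 10, prob = 1/6))      # P(at least 5 sixes), approx 0.015
    10 * (1/6)                                    # expected number of sixes, np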

Hypergeometric

The hypergeometric distribution describes the number of successes in a series of \(n\) draws without replacement from a population of size \(N\) with \(Np\) objects of interest (successes).

The probability mass function:

\[P(X=k) = \frac{\binom{Np}{k}\binom{N-Np}{n-k}}{\binom{N}{n}}\]

Examples

  • Select a subset of individuals from a (small) population; what is the probability that \(x\) of them are allergic to pollen?
  • Gene set enrichment analysis - identification of gene sets that are overrepresented in a set of genes of interest.
  • In population genetics to describe the number of individuals with a certain genotype in a population.
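
A minimal R sketch of the pollen example with hypothetical numbers (a population of N = 100 with 30 allergic individuals, n = 20 drawn):

    # dhyper(x, m, n, k): m = successes in the population, n = failures, k = draws
    N      <- 100                                 # population size
    Np     <- 30                                  # allergic individuals (successes)
    n_draw <- 20                                  # number of individuals drawn
    dhyper(5, m = Np, n = N - Np, k = n_draw)     # P(exactly 5 allergic among those drawn)
    sum(dhyper(10:20, m = Np, n = N - Np, k = n_draw))  # P(at least 10 allergic)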

Poisson

The Poisson distribution describes the number of times a rare event (probability \(p\)) occurs in a large number (\(n\)) of trials.

The probability mass function:

\[P(X=k) = \frac{\lambda^k}{k!}e^{-\lambda}\]

\[E[X] = var(X) = \lambda = n p\]

Examples

  • A rare disease has a very low probability for a single individual. The number of individuals in a large population that catch the disease in a certain time period can be modelled using the Poisson distribution.
  • In RNAseq analysis the Poisson distribution is frequently used to model read counts, in particular when gene expression levels are low.

The Poisson distribution can approximate the binomial distribution if \(n\) is large and \(p\) is small; rule of thumb: \(n>20\), \(p<0.05\), \(np < 10\).
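
A small R check of this approximation, with assumed values n = 100 and p = 0.03 (so the rule of thumb holds and \(\lambda = np = 3\)):

    # Compare exact binomial probabilities with the Poisson approximation
    n <- 100
    p <- 0.03
    dbinom(0:5, size = n, prob = p)               # exact binomial probabilities
    dpois(0:5, lambda = n * p)                    # Poisson approximation, very similar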

Negative binomial

A negative binomial distribution describes the number of failures before a specified number of successes (\(r\)) has occurred, in a sequence of independent and identically distributed Bernoulli trials.

\(r\) is also called the dispersion parameter.

Examples

  • In epidemiology the number of days (weeks) of no cases before a certain number of new cases is reported.
  • In RNAseq data analysis, the negative binomial distribution is commonly used to model read counts.
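
A minimal R sketch with arbitrary parameters (r = 3 successes, success probability p = 0.2); dnbinom gives the probability of a given number of failures before the r-th success:

    # Negative binomial: P(x failures before the 3rd success), p = 0.2
    dnbinom(5, size = 3, prob = 0.2)              # P(exactly 5 failures)
    sum(dnbinom(0:10, size = 3, prob = 0.2))      # P(at most 10 failures)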

Geometric

The geometric distribution is a special case of the negative binomial distribution, where \(r=1\).
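
This can be checked directly in R (p = 0.2 is an arbitrary choice):

    # The geometric distribution equals the negative binomial with size = 1
    dgeom(3, prob = 0.2)
    dnbinom(3, size = 1, prob = 0.2)              # same value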

Example PMFs

Figure 1: Probability mass functions for the binomial distribution (n=20, p=0.1, 0.3 or 0.5), hypergeometric distribution (N=100, n=20, p=0.1, 0.3 or 0.5), negative binomial distribution (n=20, r=n*p, p=0.1, 0.3 or 0.5) and Poisson distribution (n=20, p=0.1, 0.3 or 0.5).

In R

Probability mass functions, \(P(X=x)\): dbinom, dhyper, dpois, dnbinom and dgeom.

Cumulative distribution functions, \(P(X \leq x)\): pbinom, phyper, ppois, pnbinom and pgeom.

Quantile functions, which return the smallest \(x\) such that \(P(X \leq x) \geq q\) for a given probability \(q\): qbinom, qhyper, qpois, qnbinom and qgeom.
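
A short sketch using the 10-dice example (n = 10, p = 1/6) to show all three kinds of functions:

    # d-, p- and q-functions for the binomial distribution
    dbinom(2, size = 10, prob = 1/6)                       # P(X = 2)
    pbinom(4, size = 10, prob = 1/6)                       # P(X <= 4)
    pbinom(4, size = 10, prob = 1/6, lower.tail = FALSE)   # P(X > 4) = P(X >= 5)
    qbinom(0.95, size = 10, prob = 1/6)                    # smallest x with P(X <= x) >= 0.95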